Figure 1. Leonardo da Vinci Anatomical Drawing


Introduction


Deep Learning is making a huge impact in computer vision and natural language processing. Real-world applications include medical imaging, AI-driven healthcare, and personalized medicine.

In this notebook, we will develop and evaluate a Convolutional Neural Network model using Keras to solve a regression problem: determining skeletal age from a curated set of pediatric hand radiographs. Given x-ray images of human hands, we want to build a model that accurately predicts the patient’s age in months.

The code for the project is available on GitHub.


The RSNA Bone Age Dataset


Figure 2. RSNA 2017 Pediatric Bone Age Challenge

The Radiological Society of North America (RSNA) Bone Age dataset is from the RSNA 2017 Machine Learning Challenge. The dataset is also available to download on Kaggle.

rsna-bone-age.zip file contents:


Set up Environment


The program is written in R. Load packages and import data files.

library(pacman)
pacman::p_load(imager, dplyr, rsdepth, EBImage, dbscan, fpc, png,
               ggplot2, gridExtra, grid, keras, pbapply)

label_data <- read.csv("boneage-training-dataset.csv")
label_data.list <- split(label_data, seq(nrow(label_data)))


dir_path = "/boneage-training-dataset/"
if (!dir.exists(dir_path)){
  load("myEnvironment.Rdata")
} else{
  image_names = list.files(dir_path)
}

source("xray_imageSegmentation.R")

Data Overview


We can get a sense of the data and identify potential challenges by looking at a small subset.

Potential Issues


Preprocessing: Image Segmentation


The goal of preprocessing is to get a workable dataset from which we can derive meaningful predictions. The code used for preprocessing is available in the GitHub repository here. The script was run on all images in the training dataset; the following is a demonstration.

Original X-Ray


The resolution of the original image is 2247x2200 pixels. Notice the label towards the bottom left corner of the x-ray.

Edge Detection


Low contrast images need to be preprocessed in order to be suitable for edge detection. For these, I applied Local Contrast Normalization, a technique that enhances contrast in low contrast images.
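As a rough illustration of the idea (a sketch, not the exact routine used in xray_imageSegmentation.R), local contrast normalization can be written in base R as subtracting a local mean and dividing by a local standard deviation over a sliding window:

```r
# Toy local contrast normalization: subtract the local mean and divide by the
# local standard deviation, computed over a sliding window around each pixel.
local_contrast_normalize <- function(img, window = 5) {
  half <- window %/% 2
  n <- nrow(img); m <- ncol(img)
  out <- img
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      rs <- max(1, i - half):min(n, i + half)
      cs <- max(1, j - half):min(m, j + half)
      patch <- img[rs, cs]
      out[i, j] <- (img[i, j] - mean(patch)) / max(sd(patch), 1e-6)
    }
  }
  out
}

set.seed(1)
low_contrast <- matrix(runif(100, 0.45, 0.55), nrow = 10)  # values bunched together
normalized <- local_contrast_normalize(low_contrast)
```

After normalization the intensities are centered near zero with locally unit spread, so faint edges become much easier for an edge detector to pick up.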

Sobel edge detection was then performed on the contrast-normalized image using an illumination-gradient-based algorithm.

Extracting Contours


After detecting the edges of the x-ray, the contours of the image were mapped onto the Cartesian xy-coordinate plane. The contours are composed of densely packed \(x\) and \(y\) data points. The top two most densely packed contours are shown below.

The next task in preprocessing was to extract the data points that correspond to the hand. I evaluated several clustering algorithms to decide which one would perform best.

I found that density-based spatial clustering (DBSCAN) was best suited for this analysis. DBSCAN is an unsupervised clustering algorithm that uses spatial density to group data points that are geometrically close together, which makes it excellent at detecting shapes and patterns in clustered data.
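To illustrate the idea on made-up 2-D points (not the x-ray contours), the dbscan package loaded above separates two dense clouds into clusters and flags sparse stragglers as noise:

```r
# Toy DBSCAN run: two dense point clouds plus a few sparse points.
# eps is the neighbourhood radius; minPts is the density threshold.
library(dbscan)

set.seed(42)
cloud_a <- cbind(rnorm(50, mean = 0, sd = 0.1), rnorm(50, mean = 0, sd = 0.1))
cloud_b <- cbind(rnorm(50, mean = 3, sd = 0.1), rnorm(50, mean = 3, sd = 0.1))
sparse  <- cbind(runif(5, -1, 4), runif(5, -1, 4))
pts <- rbind(cloud_a, cloud_b, sparse)

db <- dbscan::dbscan(pts, eps = 0.3, minPts = 5)
table(db$cluster)  # cluster 0 collects any points flagged as noise
```

Unlike k-means, DBSCAN does not need the number of clusters in advance and happily ignores low-density points, which is exactly what we want when isolating the hand's contour from stray artefacts.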


DBSCAN, Density Based Clustering


Before applying DBSCAN to the image, the contours of the hand had to be separable into their own cluster. For this, I implemented a helper function called findWrist() to partition the hand from the rest of the image.

DBSCAN was used to obtain the clusters seen below.

The cluster of interest is shown in teal. I implemented an algorithm to obtain the x-min, x-max, y-min, and y-max coordinates of this cluster.


Segmentation

Final Cropped, Resized, Padded X-Ray


The coordinates obtained from the previous step were used to crop the image. The final image was resized and padded to 500x500 pixels. Two things to note here: (1) the aspect ratio and quality of the image were retained, and (2) the label from the original x-ray was cropped out. This is an important step, as artefacts like these can add noise when training the model.
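A base-R sketch of the resize-and-pad step (nearest-neighbour resizing with zero padding; the real pipeline works on the actual x-rays with EBImage, and the function name resize_and_pad here is illustrative):

```r
# Scale the longest side to the target, then centre the result on a black
# (zero) canvas so the output is always target x target with the original
# aspect ratio preserved.
resize_and_pad <- function(img, target = 500) {
  h <- nrow(img); w <- ncol(img)
  s <- target / max(h, w)                       # scale so the longest side fits
  new_h <- max(1, round(h * s)); new_w <- max(1, round(w * s))
  rows <- pmin(h, ceiling(seq_len(new_h) * h / new_h))  # nearest-neighbour indices
  cols <- pmin(w, ceiling(seq_len(new_w) * w / new_w))
  resized <- img[rows, cols, drop = FALSE]
  out <- matrix(0, target, target)              # black padding
  top  <- (target - new_h) %/% 2 + 1
  left <- (target - new_w) %/% 2 + 1
  out[top:(top + new_h - 1), left:(left + new_w - 1)] <- resized
  out
}

# Same dimensions as the original x-ray above (2247x2200)
padded <- resize_and_pad(matrix(runif(2247 * 2200), nrow = 2247, ncol = 2200))
dim(padded)  # 500 500
```

Because both sides are scaled by the same factor, the hand is never stretched; the leftover space simply becomes black margin.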



Model Selection, Feature Engineering


Figure 3. Case courtesy of Dr Aneta Kecler-Pietrzyk, Radiopaedia.org, [rID: 53260](https://radiopaedia.org/cases/computer-assisted-bone-maturity-measurement-normal-7-year-old-1)

Images can be represented numerically as pixel intensity values in three color channels: red, green, and blue. The x-rays in this project are grayscale, so there is only one color channel. Since images are inherently matrices of pixel intensity values, we can use this as the basis for processing the images and building a model.

With this in mind, there are two general models we can use to predict skeletal bone maturity.

Regression Model

The first approach is to identify and calculate features based on the image, and then train a Regression Model on those features. For instance, one feature we could measure is the relative length (in pixels) of each bone in the x-ray (Figure 3). Another feature we could consider is the spacing of the joints in the hand (this connective cartilage between the bones is known as the epiphyseal growth plates – highlighted in Figure 4). An advantage of this approach is that it approximates how bone age is assessed in the real world: trained physicians typically look at a combination of these features to estimate skeletal maturity.

Figure 4. Photo credit: [Source](https://www.alamy.com/stock-photo-hand-x-ray-12-year-old-male-two-views-with-the-epiphyseal-growth-plates-26897439.html)

This is a solid approach; however, working out the code to do the image analysis and generate the features is quite difficult. It remains an area of active research, and much progress has been made.

Deep Learning Model: Convolutional Neural Networks

The second approach is to build a Convolutional Neural Network (CNN), a deep learning technique commonly applied to images. CNNs greatly simplify the task of feature engineering: they identify features for training the model by applying kernel convolutions to the image. Sobel edge detection, Gaussian blur, and Laplacian sharpening are examples of kernel convolutions.
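To make "kernel convolution" concrete, here is a hand-rolled Sobel filter in base R. A convolution layer performs the same sliding weighted sum, except that it learns its kernel weights from the data instead of using a fixed kernel:

```r
# "Valid" 2-D convolution: each output pixel is the weighted sum of the
# corresponding 3x3 neighbourhood of the input.
sobel_x <- matrix(c(-1, 0, 1,
                    -2, 0, 2,
                    -1, 0, 1), nrow = 3, byrow = TRUE)

convolve2d <- function(img, kernel) {
  kh <- nrow(kernel); kw <- ncol(kernel)
  out <- matrix(0, nrow(img) - kh + 1, ncol(img) - kw + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- sum(img[i:(i + kh - 1), j:(j + kw - 1)] * kernel)
    }
  }
  out
}

# A sharp vertical edge: dark left half, bright right half
img <- cbind(matrix(0, 5, 3), matrix(1, 5, 3))
edges <- convolve2d(img, sobel_x)
edges[1, ]  # strong response only where the kernel straddles the edge
```

The filter responds only where intensity changes horizontally, which is exactly the kind of local feature a CNN's early layers discover on their own.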

CNNs are typically used for image classification. We will build a CNN for regression by specifying a linear output activation at the end of the fully connected network and by setting the appropriate loss and metrics in the keras::compile() function.


Keras Functional API

Keras is a deep learning library that wraps the efficient numerical libraries Theano and TensorFlow.

Figure 5. Labels for Supervised Learning: The Pixel Values of the Image, Age, Gender


Model Preparation

We will use the Keras Functional API to build the model instead of the Keras Sequential model. The Functional API allows you to define complex models with multiple inputs – which is what we have. The inputs available for supervised learning are the pixel values of the image and the patient’s age and gender (Figure 5).

Extract Labels

The first function below extracts the labels we’ll be using to train the model. The second function formats the data for the Convolutional Neural Network.

desired_res = 100    # the images will be 100x100 resolution

# extract image pixel values, patient's age and gender. resize image to 100x100 resolution
extract_features <- function(image_names, label_data.list, label_data, dir_path, res){
  ## extract pixel values for each image, store in a vector, row-wise
  feature_list <- pblapply(image_names, function(imgname) {
    img <- load.image(file.path(dir_path, imgname)) %>% EBImage::resize(w=res,h=res)
    img_matrix <- as.matrix(img)
    img_vector <- as.vector(t(img_matrix))
    return(img_vector)
  })

  ## row bind the list of vectors into matrix; each row is an image, each column is a pixel value
  feature_matrix <- do.call(rbind, feature_list) %>% as.data.frame()
  ## Set column names
  names(feature_matrix) <- paste0("pixel", c(1:ncol(feature_matrix)))

  ## extract ages, one-hot encode genders
  idx <- gsub("([0-9]+).*$", "\\1", image_names)
  age <- vector("integer", length(idx))
  gender <- vector("integer", length(idx))
  for (i in seq(length(idx))){
    num <- which(label_data$id %in% idx[i])
    age[i] <- label_data.list[[num]]$boneage %>% as.numeric()
    gender[i] <- label_data.list[[num]]$male %>% as.logical()
  }

  return (list(X = feature_matrix, age = age, gender = gender, file_name = image_names))
}

# format data structure for CNN
prep_data_for_CNN_model<-function(df, desired_res){
  dat_arr <- t(df$X)
  dim(dat_arr) <- c(desired_res, desired_res, nrow(df$X), 1)
  # Reorder dimensions
  dat_arr <- aperm(dat_arr, c(3,1,2,4))
  return(dat_arr)
}


Housekeeping

There are 12611 images in the original dataset. I removed 439 outliers after image preprocessing, leaving 12172 images.

To get a list of the filenames of the images I removed, enter the following:

load("myEnvironment.Rdata")
setdiff(list.files("/boneage-training-dataset/"), image_names)

print(length(image_names))
## [1] 12172

Set up Train and Test Data

Here we split the data into a 90/10 training/testing split. We’ll train and validate on over 10,900 images and test the model on more than 1,200 unseen images.

# the normalized images
dir_path = "/normalized_images/"

if(!dir.exists(dir_path)){
  load("myEnvironment.Rdata")
} else {
  image_names = list.files(dir_path)

  # Set up Training/Testing Split
  set.seed(42)  # set seed for data reproducibility
  training_images <- sample(image_names,length(image_names)*0.90)
  test_images <- setdiff(image_names,training_images)

  # Extract the Labels: Pixel Intensities, Genders, Ages
  trainData <- extract_features(training_images, label_data.list, label_data, dir_path, desired_res)
  testData <- extract_features(test_images, label_data.list, label_data, dir_path, desired_res)
  trainGenders <- trainData$gender %>% factor()
  trainAges <- trainData$age
  train_array <- prep_data_for_CNN_model(trainData, desired_res)
  test_array <- prep_data_for_CNN_model(testData, desired_res)
}


Sanity Check

We can perform a sanity check to ensure that the image pixel data was stored properly. The CNN will perform convolutions on the pixels of these images.

# sanity check
par(mai=rep(0.05,4))
layout(matrix(1:3, ncol=3, byrow=T))
hands = c(35,52,107)
for (i in hands){
  hand <- train_array[i,,,]
  image(t(apply(hand, 2, rev)), col = gray.colors(12), axes = F, asp=1,xaxt='n', yaxt='n')
}

Build Multi-Input Model, Model Architecture

To use the functional API, we need to specify the input and output layers and then pass them to the keras::keras_model() function.

The main input to the model is the matrix of image pixel values, which is fed to the convolutional neural network (CNN).

Layers in the CNN

  • Convolution Layer: This layer identifies features in the image by performing kernel convolutions on the pixel values.
  • Pooling Layer: This layer reduces the number of parameters and computations in the network by down-sampling the image, which cuts processing time and helps prevent overfitting. It is usually placed after a convolution layer.
  • Dropout Layer: This layer ignores a random set of activations during forward and backward propagation by setting them to zero – this again helps prevent overfitting the model.
  • Flattening Layer: This layer flattens the feature maps from the previous layers into a single one-dimensional vector of features.
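As a quick illustration of the pooling layer's down-sampling, here is 2x2 max pooling applied to a toy 4x4 feature map in base R:

```r
# 2x2 max pooling: each non-overlapping 2x2 block of the feature map is
# reduced to its maximum, halving both spatial dimensions.
max_pool_2x2 <- function(m) {
  out <- matrix(0, nrow(m) / 2, ncol(m) / 2)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- max(m[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}

fm <- matrix(1:16, nrow = 4, byrow = TRUE)  # a toy 4x4 feature map
max_pool_2x2(fm)  # a 2x2 map: 6 8 / 14 16
```

Only the strongest activation in each block survives, which is why pooling preserves the presence of a feature while discarding its exact position.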

The model will also have an auxiliary input: the patient’s gender. The auxiliary input layer is concatenated with the flattened CNN output, and the merged layer is fed through a dense network to the main output layer.

Here’s what the model looks like:

Let’s implement it with the functional API.

# main input layer, the image pixel values
input_img = layer_input(shape = dim(train_array)[2:4], dtype = 'float32', name="input_img")

# convolutional neural network
convNet_input <- input_img %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu') %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_dropout(rate = 0.35) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.35) %>%
  layer_flatten()

# auxiliary input layer, gender
auxiliary_input <- layer_input(shape = c(2), dtype = 'float32', name = 'auxiliary_input')

# main output layer, age
main_output <- layer_concatenate(c(convNet_input, auxiliary_input)) %>%
  layer_dense(units = 256, activation = 'relu') %>%
  layer_dropout(rate = 0.35) %>%
  layer_dense(units = 1, activation = 'linear', name = 'main_output')

Here we define a model with two inputs and one output.

model <- keras_model(inputs = c(input_img, auxiliary_input), outputs = main_output)

Compile Model

Here we compile the model. The main output layer will be trained to predict age. Since the value we’re predicting is numeric and this is a regression problem, the model will be supervised via the Mean Squared Error (mse) loss function. The performance metric for age will be Mean Absolute Error (mae).
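For reference, with predicted ages \(\hat{y}_i\) and actual ages \(y_i\) over \(n\) patients, the two quantities are:

```latex
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2,
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|
\]
```

MSE squares the residuals, so it penalizes large mistakes more heavily during training, while MAE reports the error in the same units as the target (months), which makes it the more interpretable metric.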

model %>% compile(loss = 'mse',
                   optimizer = optimizer_rmsprop(lr = 0.001),
                   metrics = 'mae')

Model Summary

We can look at a summary of the model. The summary shows how the model is constructed: the layer order, the shape of each layer's output, and the number of parameters. These details are useful for troubleshooting and optimizing the model later.

summary(model)
## _______________________________________________________________________________
## Layer (type)                   Output Shape         Param #   Connected to
## ===============================================================================
## input_img (InputLayer)         (None, 100, 100, 1)  0
## _______________________________________________________________________________
## conv2d_1 (Conv2D)              (None, 98, 98, 32)   320       input_img[0][0]
## _______________________________________________________________________________
## conv2d_2 (Conv2D)              (None, 96, 96, 32)   9248      conv2d_1[0][0]
## _______________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 48, 48, 32)   0         conv2d_2[0][0]
## _______________________________________________________________________________
## dropout_1 (Dropout)            (None, 48, 48, 32)   0         max_pooling2d_1[0][0]
## _______________________________________________________________________________
## conv2d_3 (Conv2D)              (None, 46, 46, 64)   18496     dropout_1[0][0]
## _______________________________________________________________________________
## conv2d_4 (Conv2D)              (None, 44, 44, 64)   36928     conv2d_3[0][0]
## _______________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D) (None, 22, 22, 64)   0         conv2d_4[0][0]
## _______________________________________________________________________________
## dropout_2 (Dropout)            (None, 22, 22, 64)   0         max_pooling2d_2[0][0]
## _______________________________________________________________________________
## flatten_1 (Flatten)            (None, 30976)        0         dropout_2[0][0]
## _______________________________________________________________________________
## auxiliary_input (InputLayer)   (None, 2)            0
## _______________________________________________________________________________
## concatenate_1 (Concatenate)    (None, 30978)        0         flatten_1[0][0]
##                                                              auxiliary_input[0][0]
## _______________________________________________________________________________
## dense_1 (Dense)                (None, 256)          7930624   concatenate_1[0][0]
## _______________________________________________________________________________
## dropout_3 (Dropout)            (None, 256)          0         dense_1[0][0]
## _______________________________________________________________________________
## main_output (Dense)            (None, 1)            257       dropout_3[0][0]
## ===============================================================================
## Total params: 7,995,873
## Trainable params: 7,995,873
## Non-trainable params: 0
## _______________________________________________________________________________

Train Model

Here we assign the list of inputs to variable x and the output to variable y. The model will be trained for 100 epochs. We’re validating on 20% of the training images.

val_split = 0.2
history <- model %>% fit(
  x = list(input_img = train_array, auxiliary_input = to_categorical(trainGenders)),
  y = list(main_output = trainAges),
  epochs = 100,
  batch_size = 64,
  validation_split = val_split
)

The model has completed training. We can visualize the model’s training progress using the metrics stored in the history variable.

title_str <- paste("Training History: Trained on", floor(nrow(train_array) * (1 - val_split)),
                   "IMGs, Validated on", floor(nrow(train_array) * val_split), "IMGs")
plot(history) + ggtitle(title_str)

Interpreting the Training History

Initially, the loss function (mean squared error) and the mean absolute error (MAE) are quite high, but both eventually decrease as the model trains.

There is no evidence of overfitting: the validation error is low and only slightly higher than the training error, which indicates a good fit. Intuitively, the training error should be lower because the model learns from the training data and is expected to perform better on “learned” examples, while the validation data is “unknown” to the model.

Predict

We can evaluate the model’s performance on the test data using the keras::evaluate() function.

model %>% evaluate(list(test_array, to_categorical(testData$gender)), testData$age)
## $loss
## [1] 340.2484
##
## $mean_absolute_error
## [1] 13.6809

MAE tells us how large an error, on average, we can expect between the forecasted value and the actual value. The MAE here is roughly 14 months, which is quite good for this model.
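Concretely, with a handful of made-up ages in months, MAE is just the mean absolute difference:

```r
# MAE by hand: the average absolute gap between predicted and actual values
actual    <- c(120, 60, 150, 96)   # actual bone ages (months), made-up values
predicted <- c(110, 75, 160, 90)
mae <- mean(abs(predicted - actual))
mae  # 10.25
```

An MAE of 10.25 here means the predictions miss the true age by about ten months on average, which mirrors how the test-set figure above should be read.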

We can make predictions on the test data using the predict() function.

predictions <-  predict(model, list(test_array, to_categorical(testData$gender)))

Evaluate Results


Predicted vs Actual Bone Age

We can make a “Predicted vs Actual” plot to visualize the accuracy of the regression model.

predictedAges <- data.frame(x=testData$age, y=predictions)
actualAges <- data.frame(x=testData$age, y=testData$age)

ggplot(data=predictedAges, aes(x, y)) +
  geom_point(aes(color="predicted")) +
  geom_line(data = actualAges, aes(color = "actual")) +
  labs(title=paste("Predicted Bone Age vs Actual Bone Age:", nrow(predictedAges), "Test Cases"),
       x="Actual Age (Months)", y = "Predicted Age (Months)") +
  theme(plot.title = element_text(hjust = 0.5), legend.position=c(0,1), legend.justification = c(-0.5,1.3),
        legend.background = element_rect(color = 'black', fill = 'white', linetype='solid')) +
  theme(legend.title=element_blank()) +
  scale_color_manual(values=c("blue", "red")) +
  guides(color = guide_legend(reverse = T, override.aes = list(linetype = c("blank","solid"), shape = c(16,NA)) ) )

The Eye Test

The predicted ages are fairly close to the actual bone ages, and the model performs well across the full age range. We can visualize a small subset of the predictions.

The results below show that the model performs well on low- and high-contrast images, slightly rotated images, images with and without a label, and images with various hand placements. This suggests the model is robust, and that preprocessing reduced noise and improved accuracy.

v <- as.integer(gsub("([0-9]+).*$", "\\1", testData$file_name))
set.seed(42)
picks <- sample(v, 12)

if (dir.exists(dir_path)){
  imgs<-lapply(picks , function(i) {
    readPNG(paste0(dir_path,i,".png")) %>% rasterGrob()
  })
}

p <- lapply(seq(1,length(picks)), function(i) {
  filename <- paste0(picks[[i]],".png")
  id = which(testData$file_name %in% filename)
  qplot(geom="blank") + annotation_custom(imgs[[i]]) +
    labs(title=paste0("Age: ", testData$age[[id]], " months\nPredicted Age: ",
                      as.integer(predictions[[id]]), " months\nFilename: ", filename)) +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(axis.title.x=element_blank()) +
    theme(axis.title.y=element_blank()) +
    theme(axis.line = element_line(colour = "black"),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank()) +
    theme(plot.title = element_text(size=10))
}) %>% invisible()

n <- length(p)
nRow <- floor(sqrt(n))

do.call("grid.arrange", c(p, nrow=nRow))


Conclusion


We successfully built a model that predicts bone age from x-ray images. The model’s mean absolute error (MAE) is approximately 14 months. Also, the data points in the “Predicted vs Actual” plot lie close to the diagonal, indicating that the predictions are fairly accurate. There was no evidence of overfitting.

There are several ways the model could be improved further.

Great references for improving the model performance:


Closing Remarks


This was a capstone project that I worked on while completing the Deep Learning Specialization on Coursera. I enjoyed working on this project because it showcased the types of problems we can solve using machine learning.

Project Motivation

I wanted to work on a healthcare-related real-world dataset to solve a complex problem. I like the idea of improving healthcare through technology. The deceptively simple yet non-trivial task of predicting age from x-ray imaging data was perfect in this regard. It’s a regression problem on unstructured data. I used supervised and unsupervised learning approaches to build the model.

Domain Knowledge and Intuition

The first step was to identify features I could use to predict age. I did some initial reading to gain domain knowledge and some intuition about what was needed to build a predictive model. I felt it was important to understand how trained physicians predict age from x-rays so that I could automate the process and predict bone age at scale.

I learned that physicians look at the epiphyseal growth plates in the hand as well as the relative length of the bones to predict the person’s age. Once I identified these features, I brainstormed ideas on how I was going to extract them to train the model. I quickly realized that a deep learning approach was the way to go.

Research and Development

A lot of research went into the image segmentation part of the project. I probably bookmarked over 30 webpages on image analysis and preprocessing, computational geometry algorithms, clustering algorithms, among other topics related to computer vision. I’m happy with the results. The extra preprocessing and subsampling improved the model's accuracy.

The last part of the project was to build the convolutional neural network with the Keras functional API. I applied concepts I learned in the Deep Learning Specialization to build the CNN: how to structure the model's architecture and network topology, hyperparameter tuning, regularization, etc. I discuss the model in detail in the sections above. I was pleased when I finally got the model to work. It was very rewarding.

Thanks for reading. Hopefully this notebook serves as a guide on how to use the Keras functional API for regression problems.







\(\blacksquare\)